In their paper, Nicolae, Meng and Kong (henceforth NMK) propose several very interesting methods for quantifying the fraction of missing information in a sample, and focus their attention on genetic
studies. Survival analysis is another area in statis- 
tics where missing information plays an important 
role. Here, censoring complicates study design, for 
example when we want to determine how big a clin- 
ical trial should be in order to have a good chance of 
detecting a treatment effect in a Cox model. Most 
current methods for dealing with this difficult prob- 
lem involve two stages, where in the first stage we 
make a projection of what the variance of the coefficient of the treatment effect would be if there were no censoring, and in the second stage we make a correction to adjust for the censoring. Often this
is done under restrictive parametric (e.g., exponen- 
tial) assumptions for the underlying distributions. It 
would be desirable to use the methods proposed by 
NMK in the survival analysis setting. I tried to carry 
over their methods to the Cox model, and encoun- 
tered some problems. The difficulties I discovered 
led me to consider modifications of their proposals, 
which I believe work well. Below I discuss the setup 
I consider, my experiences, the issues, and some ap- 
proaches I think are promising. 

1. SURVIVAL STUDIES FOR ASSESSING THE 
EFFICACY OF A NEW TREATMENT 

A typical clinical trial with a survival outcome involves a fixed time frame, say five years. Patients enter the trial continuously during the first four years and are randomly assigned to treatment or control; the last year is a followup year, during which no patients enter the study. Some patients die during the study, in which case their survival time is observed. But some patients die from other causes or are lost to followup, and some are still alive when the trial ends; in these cases the survival time is censored: for each individual in this group, there is a time $t$ and we know only that the individual's survival time is greater than $t$.

Clearly the censoring reduces information regard- 
ing the efficacy of the new treatment. When de- 
signing a subsequent study in the hope of getting 
stronger evidence against the null hypothesis of no 
treatment effect, we now have two choices: increase 
the number of patients in the study, which can be 
expensive, or try to reduce the censoring. We can re- 
duce the censoring either by putting more resources 
into followup, or by extending the length of the pe- 
riod of time after the end of the accrual period. 
These result in costs which are financial and also 
ethical because increasing the length of the final fol- 
lowup period postpones publication of results that 
are of potential benefit to other patients. The deci- 
sion of whether to increase the number of patients 
or to reduce the censoring depends crucially on the 
amount of information loss due to censoring, so be- 
ing able to measure this is extremely important in 
the design of future studies. This situation is very 
similar to the one discussed by NMK. 

By far the most commonly used model for regression with censored survival data is the Cox proportional hazards model. Suppose that individual $i$ has covariate vector $Z_i = (Z_{i1}, \ldots, Z_{ip})$, where $Z_{i1}$ is the indicator that the individual receives the treatment. Let $X_i$ be the death time of individual $i$ if there were no censoring, and let $Y_i$ be the censoring time. For each individual, we observe the minimum $T_i = \min(X_i, Y_i)$ and also the indicator $\delta_i$ that $X_i$ was not censored, that is, $\delta_i = I(X_i < Y_i)$. So the data for individual $i$ is the triple $(T_i, \delta_i, Z_i)$.
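As a minimal sketch of how the observed triple arises from the latent death and censoring times, consider the following. The function name `observe` and the numerical values are hypothetical and serve only as an illustration; they are not taken from any data set discussed here.

```python
def observe(x, y, z):
    """Return the observed triple (T, delta, Z) for one individual.

    x: latent death time, y: censoring time, z: covariate vector.
    """
    t = min(x, y)
    delta = 1 if x < y else 0  # indicator I(X < Y), as in the text
    return (t, delta, z)


# Hypothetical latent values (death time, censoring time, covariates).
latent = [(3.2, 5.0, (1,)), (7.1, 4.0, (0,)), (2.5, 2.9, (1,))]
data = [observe(x, y, z) for x, y, z in latent]
print(data)
```

Only the second individual is censored here, so the analyst sees the censoring time 4.0 rather than the death time 7.1.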

The proportional hazards model stipulates that the hazard rate for an individual with covariate vector $Z$ is given by

(1) $\lambda(t \mid Z) = \lambda_0(t) \exp(\beta' Z)$,

where $\beta$ is a $p$-dimensional vector of coefficients, and $\lambda_0$ is the hazard function for an individual with covariate vector 0. For our purposes (as will be clear later), it is preferable to define the model in terms of cumulative hazard functions, and so by integrating (1), the model is stated by specifying that $\Lambda(t \mid Z)$, the cumulative hazard function for an individual with covariate $Z$, is related to $\Lambda_0(t)$, the cumulative hazard function for an individual with covariate 0, via

(2) $\Lambda(t \mid Z) = \Lambda_0(t) \exp(\beta' Z)$.

The model is parameterized by $\theta = (\Lambda_0, \beta)$, in which $\Lambda_0$ is considered a nuisance parameter. The likelihood function is very complex, and involves both $\Lambda_0$ and $\beta$. Cox's partial likelihood (Cox, 1972, 1975), which is literally just a part of the full likelihood (see Efron, 1977), involves only $\beta$.

2. MEASURING THE RELATIVE 
INFORMATION IN THE DATA 

There is a large literature that shows that Cox's partial likelihood has the main features of an ordinary likelihood: the maximum partial likelihood estimator $\hat\beta$ is consistent and asymptotically normal (Andersen and Gill, 1982), and there are several papers (Efron, 1977; Oakes, 1977) that show that inference based on this partial likelihood is essentially as good as inference based on the full likelihood. Standard software gives the partial likelihood function. For example in R, if we fit a Cox model to a data set and call the result fitcox, then fitcox$loglik gives the log of the partial likelihood, evaluated at any desired value of the coefficient vector, and also at the maximum partial likelihood estimate.
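To make concrete what the log partial likelihood computes, here is a minimal sketch for a single scalar covariate. The function name `log_partial_likelihood` is hypothetical, and ties are handled with Breslow's convention, which is an assumption of this sketch; R's `coxph` uses Efron's approximation by default, so values can differ from the software's when there are tied death times.

```python
import math


def log_partial_likelihood(beta, data):
    """Cox log partial likelihood for a scalar covariate.

    data: list of (time, delta, z) triples; delta = 1 for an observed death.
    Ties are handled with Breslow's convention (an assumption of this sketch).
    """
    total = 0.0
    for t_i, delta_i, z_i in data:
        if delta_i == 1:
            # Risk set: everyone still under observation at time t_i.
            risk = sum(math.exp(beta * z) for t, _, z in data if t >= t_i)
            total += beta * z_i - math.log(risk)
    return total


# Hypothetical toy data: (time, delta, z).
toy = [(1.0, 1, 1), (2.0, 1, 0), (3.0, 0, 1)]
print(log_partial_likelihood(0.0, toy))
```

Only the death times contribute terms, and an observation censored before the first death never enters a risk set, so adding such observations leaves the function unchanged.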

These considerations suggest that we use the partial likelihood function as a likelihood in the measure $\mathcal{RI}_1$ that NMK propose. For a data set $D$, let $\ell_D(\beta)$ denote the log partial likelihood function based on $D$. Let $D_{\mathrm{ob}}$ denote the observed data, and $D_{\mathrm{co}}$ denote the full data, had we been able to see it. Suppose we wish to test the null hypothesis that $\beta = \beta_0$. If we use the partial likelihood, the numerator of $\mathcal{RI}_1$ is simply $\ell_{D_{\mathrm{ob}}}(\hat\beta) - \ell_{D_{\mathrm{ob}}}(\beta_0)$, and the denominator is

(3) $E_{\hat\theta}\{\ell_{D_{\mathrm{co}}}(\hat\beta) \mid D_{\mathrm{ob}}\} - E_{\hat\theta}\{\ell_{D_{\mathrm{co}}}(\beta_0) \mid D_{\mathrm{ob}}\}$.

In (3), $D_{\mathrm{co}}$ is random and has the conditional distribution of the complete data given the observed data, and the subscript $\hat\theta$ indicates that this conditional distribution is computed under the assumption that $\hat\theta$ is the true value of $\theta$. Here, the maximum likelihood estimator of $\theta$ is $\hat\theta = (\hat\Lambda_0, \hat\beta)$, where $\hat\Lambda_0$ is the Nelson-Aalen estimator of $\Lambda_0$. This expectation is hopelessly difficult to compute. However, it is possible to estimate it via Monte Carlo, and the last section of this article details how to do this.
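The Monte Carlo estimate of the ratio whose denominator is (3) can be sketched as follows. This is only an outline under stated assumptions: `ri1_monte_carlo`, `loglik` and `impute` are hypothetical names, `impute` stands for any routine that returns one random completion of the observed data, and the quadratic "log likelihood" and the data-doubling "imputer" in the usage lines are deterministic toys chosen so that the answer is known.

```python
def ri1_monte_carlo(loglik, d_ob, beta_hat, beta0, impute, n_sim=1000):
    """Estimate the ratio: observed evidence over expected complete-data evidence.

    loglik(beta, data) returns a log (partial) likelihood value;
    impute(d_ob) returns one simulated complete data set.
    """
    numerator = loglik(beta_hat, d_ob) - loglik(beta0, d_ob)
    total = 0.0
    for _ in range(n_sim):
        d_co = impute(d_ob)  # one draw of the complete data given the observed data
        total += loglik(beta_hat, d_co) - loglik(beta0, d_co)
    return numerator / (total / n_sim)


# Toy check with a quadratic stand-in for the log likelihood (hypothetical).
toy_loglik = lambda b, d: -sum((x - b) ** 2 for x in d)
observed = [1.0, 2.0, 3.0]
ratio = ri1_monte_carlo(toy_loglik, observed, 2.0, 0.0, lambda d: d + d, n_sim=10)
print(ratio)
```

In the toy usage the "complete" data are the observed data repeated twice, so the ratio is one half.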

To assess the performance of this measure I con- 
sidered the "acute myelogenous leukemia data" and 
some perturbations of it. This data set is given in 
Miller (1981, page 49), and is available in the survival 
package in R. There are 11 individuals receiving the 
new treatment (Z = 0), of whom four have censored 
survival times, and 12 individuals receiving the stan- 
dard treatment (Z = 1), of whom one has a censored 
survival time. We are interested in testing the null hypothesis that $\beta = 0$, indicating no treatment effect.

Table 1 gives three versions of this data set, of which the first is the original data set. Dataset aml-1 is a perturbed version in which (i) all the status indicators $\delta_i$ that were 0 were changed to 1 and (ii) 11 observations, all censored at time 0, were added to the new treatment group, and 12 observations, all censored at time 0, were added to the standard treatment group. The inclusion of these 23 new observations all censored at time 0 doubles the size of the data set but adds no information whatsoever, and any reasonable method for estimating the relative information in the data should give 0.5. This is the censored data analogue of the example of unobserved Bernoullis in Section 1.3 of NMK. Dataset aml-2 is a perturbed version of the original data set in which 11 observations, all censored at time 0, are added to the new treatment group, and 12 observations, all censored at time 0, are added to the standard treatment group; but the original part of the data set was not altered.
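The two perturbations can be sketched in a few lines. The function names and the toy triples below are hypothetical (the toy values are not the leukemia data); each record is a (time, status, group) triple.

```python
def add_zero_censored(data):
    """Append, for every record, one extra record in the same group censored at time 0."""
    return data + [(0.0, 0, z) for _, _, z in data]


def make_variant_1(data):
    """Analogue of aml-1: mark every original record uncensored, then pad."""
    uncensored = [(t, 1, z) for t, _, z in data]
    return add_zero_censored(uncensored)


def make_variant_2(data):
    """Analogue of aml-2: keep the original records as they are, then pad."""
    return add_zero_censored(data)


# Hypothetical toy data, not the leukemia data.
toy = [(5.0, 1, 1), (8.0, 0, 1), (3.0, 1, 0)]
v1, v2 = make_variant_1(toy), make_variant_2(toy)
print(len(v1), len(v2))
```

Both variants double the sample size; the first has exactly half of its records fully informative, while the second also retains the original censored records.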

Table 1
Three versions of the leukemia data
Notation of the sort ... indicates a string of 11 0's.

The results are given in line 1 of Table 2. They are surprising. The value of $\mathcal{RI}_1$ for the original data set is 0.987, suggesting that there is essentially no missing information, even though 5 of the 23 observations are censored; for aml-1, the value is 0.552 whereas it should be 0.5, or at least very close to 0.5; and what is more worrisome is that for aml-2 it is bigger than for aml-1, even though aml-2 has more missing data. In fact, it is not even true that $\mathcal{RI}_1$ is always less than 1. (A particular instance of this phenomenon arises when dealing with the data set veteran, available in the survival package in R, when testing whether the treatment effect is 0, and ignoring all other predictors.)

An explanation for this is as follows. The partial likelihood uses only the information at the times of the uncensored deaths (Efron, 1977), whereas the full likelihood also includes the information between successive uncensored deaths. The data used to form the numerator of $\mathcal{RI}_1$ (the observed data) involve some censored observations, whereas the data used to form the denominator (the complete data) do not. So it appears that the parts missing from the partial likelihood are different in the numerator and denominator of $\mathcal{RI}_1$. This is a very rare instance where using the partial likelihood creates serious problems. The net effect is that the key inequality (16) in NMK fails: the inequality is based on using the full likelihood. Consequently the basic inequality $\mathcal{RI}_1 \le 1$ need not hold.

The rationale for the criterion $\mathcal{RI}_1$ suggests the following alternative way of forming the ratio of "evidence against the null hypothesis in the present sample" to the "expected value of the evidence against the null hypothesis if we had the complete data set," which bypasses the likelihood function. For a given method of estimating $\theta$ and a data set $D$, let $\hat\theta(D)$ denote the estimate based on data $D$, and let $\widehat{\mathrm{Var}}(\hat\theta(D))$ be an estimate of the covariance matrix of $\hat\theta(D)$. Also let $V_{\mathrm{ob}} = \widehat{\mathrm{Var}}(\hat\theta(D_{\mathrm{ob}}))$ and $V_{\mathrm{co}}$ be the matrix whose inverse is given by

(4) $V_{\mathrm{co}}^{-1} = E_{\hat\theta}\{[\widehat{\mathrm{Var}}(\hat\theta(D_{\mathrm{co}}))]^{-1} \mid D_{\mathrm{ob}}\}$,

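where, as before, $D_{\mathrm{co}}$ is the complete data, which is random and has the conditional distribution of the complete data given the observed data; and the subscript $\hat\theta$ indicates that this conditional distribution is computed under the assumption that $\hat\theta$ is the true value of $\theta$. We form

(5) $\mathcal{RI}_W = \dfrac{(\hat\theta(D_{\mathrm{ob}}) - \theta_0)' \, V_{\mathrm{ob}}^{-1} \, (\hat\theta(D_{\mathrm{ob}}) - \theta_0)}{(\hat\theta(D_{\mathrm{ob}}) - \theta_0)' \, V_{\mathrm{co}}^{-1} \, (\hat\theta(D_{\mathrm{ob}}) - \theta_0)}$,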

which is a ratio of Wald-like quantities. If the dimension of $\theta$ is 1, the reciprocal of $\mathcal{RI}_W$ simplifies to

$(\mathcal{RI}_W)^{-1} = E_{\hat\theta}\left\{ \dfrac{\widehat{\mathrm{Var}}(\hat\theta(D_{\mathrm{ob}}))}{\widehat{\mathrm{Var}}(\hat\theta(D_{\mathrm{co}}))} \,\Big|\, D_{\mathrm{ob}} \right\}$

and has the interpretation of "expected value of the ratio of the variance of the $\hat\theta$ we have to the variance of what $\hat\theta$ would be if we had the complete data." Motivation for (5) in general is given at the end of this section.
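In the one-dimensional case the Wald-type ratio reduces to simple arithmetic on variance estimates, which the following sketch illustrates. The name `ri_w_scalar` is hypothetical; `var_co_draws` is assumed to hold the complete-data variance estimates obtained from Monte Carlo completions of the observed data.

```python
def ri_w_scalar(var_ob, var_co_draws):
    """One-dimensional Wald-type ratio from variance estimates.

    var_ob: variance estimate from the observed data.
    var_co_draws: variance estimates from simulated complete data sets.
    """
    # Monte Carlo version of the inverse in (4): average of inverse variances.
    inv_v_co = sum(1.0 / v for v in var_co_draws) / len(var_co_draws)
    return (1.0 / var_ob) / inv_v_co


# Hypothetical variance estimates.
value = ri_w_scalar(2.0, [1.0, 4.0])
print(value, 1.0 / value)
```

With these toy numbers the reciprocal equals the average of the ratios 2/1 and 2/4, in line with the interpretation given above.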

When we apply this criterion to the example of unobserved Bernoullis in Section 1.3 of NMK, a short calculation shows that this approach gives what $\mathcal{RI}_1$ gives, namely that the fraction of information in the sample is $n_0/n$ (to order $1/n$).
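A small simulation can illustrate this calculation. The setup below is an assumption of the sketch, not a transcription of NMK's example: `n_obs` of `n_total` Bernoulli trials are observed, the estimate is the sample proportion, and its variance is estimated by p(1 - p)/n. All names are hypothetical.

```python
import random


def ri_w_bernoulli(n_obs, n_total, successes, n_sim=2000, seed=0):
    """Monte Carlo Wald-type ratio when only n_obs of n_total Bernoullis are seen."""
    rng = random.Random(seed)
    p_hat = successes / n_obs
    inv_v_ob = n_obs / (p_hat * (1.0 - p_hat))  # inverse of p(1-p)/n_obs
    inv_v_co_sum = 0.0
    for _ in range(n_sim):
        # Impute the unobserved trials under the fitted success probability.
        extra = sum(rng.random() < p_hat for _ in range(n_total - n_obs))
        p_co = (successes + extra) / n_total
        inv_v_co_sum += n_total / (p_co * (1.0 - p_co))
    return inv_v_ob / (inv_v_co_sum / n_sim)


estimate = ri_w_bernoulli(n_obs=200, n_total=400, successes=80)
print(estimate)
```

With half of the trials observed, the estimate should be close to one half, up to Monte Carlo error and a term of order 1/n.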

Table 2
Monte Carlo estimates of the $\mathcal{RI}_1$ and $\mathcal{RI}_W$ criteria, together with 99% confidence intervals, on three versions of the leukemia data
Each case is obtained by a Monte Carlo run of 5000 simulations, as described in Section 3, and takes about a minute to produce on a 3.8-GHz dual core P4 running Linux. For $\mathcal{RI}_W$, the estimates are very stable: 99% confidence intervals have width of about 0.003. For $\mathcal{RI}_1$, the confidence intervals are wider.

Line 2 of Table 2 gives the value of $\mathcal{RI}_W$ for the three versions of the leukemia data, when we estimate $\beta$ via the maximum partial likelihood estimator, and the variance estimate is the inverse of the negative second derivative of the log partial likelihood function at its maximum. The pattern we see makes sense. For aml-orig, which includes five partially informative censored observations, $\mathcal{RI}_W$ gives a number intermediate between 1 and the proportion of uncensored observations (0.783); it is almost equal to 0.5 for aml-1, correctly reflecting the fact that the additional 23 points censored at 0 give no information at all; and it is less than 0.5 for aml-2, which includes not only 23 completely uninformative points, but also the original censored observations. It should be noted that the variances used in the calculation of $V_{\mathrm{ob}}$ and $V_{\mathrm{co}}$ are estimated variances, and so the value of $\mathcal{RI}_W$ depends on the particular estimate that is used. This dependence may be noticeable in small samples. For instance, this is the reason why $\mathcal{RI}_W$ gives 0.490 instead of 0.5 for aml-1. Table 2 gives results for a single experiment, but I got very similar results for many other data sets, including data sets that are bigger, have a bigger percentage of censored observations, or both.
Criterion (5) has the following advantages:

• It does not require the evaluation of a likelihood at some estimate. In fact, $\hat\theta$ need not be a maximum likelihood estimator, and there need not even be a likelihood function. This is important for situations in which there is no likelihood at all, for example when we have a single randomly censored sample and we use the Kaplan-Meier estimate.

• It handles nuisance parameters without modification. That is, if $\theta = (\theta^{(1)}, \theta^{(2)})$, and the null hypothesis involves only $\theta^{(1)}$, then we simply form (5) with $\hat\theta^{(1)}$ and $\theta_0^{(1)}$ replacing $\hat\theta$ and $\theta_0$, etc.

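To motivate (5), suppose we are in a parametric framework, and recall that $\mathcal{RI}_1$ is given by

(6) $\mathcal{RI}_1 = \dfrac{\ell_{D_{\mathrm{ob}}}(\hat\theta(D_{\mathrm{ob}})) - \ell_{D_{\mathrm{ob}}}(\theta_0)}{E_{\hat\theta}\{\ell_{D_{\mathrm{co}}}(\hat\theta(D_{\mathrm{ob}})) - \ell_{D_{\mathrm{co}}}(\theta_0) \mid D_{\mathrm{ob}}\}}$,

and let us compare this to the closely related quantity

(7) $\mathcal{RI}_{W\text{-alt}} = \dfrac{(\hat\theta(D_{\mathrm{ob}}) - \theta_0)' \, [-\ddot\ell_{D_{\mathrm{ob}}}(\hat\theta(D_{\mathrm{ob}}))] \, (\hat\theta(D_{\mathrm{ob}}) - \theta_0)}{E_{\hat\theta}\{(\hat\theta(D_{\mathrm{ob}}) - \theta_0)' \, [-\ddot\ell_{D_{\mathrm{co}}}(\hat\theta(D_{\mathrm{co}}))] \, (\hat\theta(D_{\mathrm{ob}}) - \theta_0) \mid D_{\mathrm{ob}}\}}$,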

in which $\ddot\ell_{D_{\mathrm{ob}}}$ denotes the second derivative (with respect to $\theta$) of $\ell_{D_{\mathrm{ob}}}$. Consider the numerator of (6). Assuming standard regularity conditions, a two-term Taylor expansion of $\ell_{D_{\mathrm{ob}}}(\theta_0)$ around $\hat\theta(D_{\mathrm{ob}})$ gives the numerator of (7) (except for a factor of 2). If we expand $\ell_{D_{\mathrm{co}}}(\theta_0)$ around $\hat\theta(D_{\mathrm{ob}})$ and approximate $\dot\ell_{D_{\mathrm{co}}}(\hat\theta(D_{\mathrm{ob}}))$ and $\ddot\ell_{D_{\mathrm{co}}}(\hat\theta(D_{\mathrm{ob}}))$ by $\dot\ell_{D_{\mathrm{co}}}(\hat\theta(D_{\mathrm{co}}))$ and $\ddot\ell_{D_{\mathrm{co}}}(\hat\theta(D_{\mathrm{co}}))$, respectively, the denominator of (6) is the denominator of (7) (except for a factor of 2), and in (7) we may take $(\hat\theta(D_{\mathrm{ob}}) - \theta_0)'$ and $(\hat\theta(D_{\mathrm{ob}}) - \theta_0)$ outside the expectation. Expressions (5) and (7) are the same, except that in (5) we use an estimate of the inverse variance that is not necessarily given by the negative observed Fisher information.
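For the numerator, the Taylor step can be written out as follows. This is a sketch under the usual regularity conditions, writing $\hat\theta$ for $\hat\theta(D_{\mathrm{ob}})$ and $\dot\ell$ for the first derivative, and assuming that $\hat\theta$ is an interior maximizer so that the first-order term vanishes.

```latex
\begin{align*}
\ell_{D_{\mathrm{ob}}}(\theta_0)
  &\approx \ell_{D_{\mathrm{ob}}}(\hat\theta)
     + (\theta_0 - \hat\theta)'\,\dot\ell_{D_{\mathrm{ob}}}(\hat\theta)
     + \tfrac{1}{2}\,(\theta_0 - \hat\theta)'\,\ddot\ell_{D_{\mathrm{ob}}}(\hat\theta)\,(\theta_0 - \hat\theta),
  \qquad \dot\ell_{D_{\mathrm{ob}}}(\hat\theta) = 0, \\
\ell_{D_{\mathrm{ob}}}(\hat\theta) - \ell_{D_{\mathrm{ob}}}(\theta_0)
  &\approx \tfrac{1}{2}\,(\hat\theta - \theta_0)'\,
     \bigl[-\ddot\ell_{D_{\mathrm{ob}}}(\hat\theta)\bigr]\,(\hat\theta - \theta_0).
\end{align*}
```

The right side of the second line is one half of the numerator of (7), which is the factor of 2 mentioned above; the same factor appears in the denominator and cancels in the ratio.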

3. GENERATING A COMPLETE DATA SET 

Let $S(t \mid Z)$ be the survival function for an individual with covariate vector $Z$. The proportional hazards model may be reformulated as

(8) $S(t \mid Z) = (S_0(t))^{\exp(\beta' Z)}$,

where $S_0$ is the survival function for an individual with covariate vector 0. Models (2) and (8) are equivalent in the continuous case, for which the survival function and corresponding cumulative hazard function are related via $S(t) = \exp(-\Lambda(t))$. In general, (2) and (8) are not the same, and it is important to decide on the specification of the Cox model, and here we take (8) as our definition. There are reasons why (8) is more sensible; see Kalbfleisch and Prentice (1980, Section 4.6).

For an individual with covariate 0, the survival function and the cumulative hazard function are related via the product integral $S_0(t) = \prod_{s \le t} (1 - \Lambda_0(ds))$ (Gill and Johansen, 1990), so by (8) the survival function for an individual with covariate $Z$ is given by

(9) $S(t \mid Z) = \left\{ \prod_{s \le t} (1 - \Lambda_0(ds)) \right\}^{\exp(\beta' Z)}$.
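For a discrete estimate of the baseline cumulative hazard, the product integral in (9) is an ordinary running product over the jump times. A minimal sketch, with the hypothetical name `survival_from_hazard` and hypothetical jump sizes:

```python
def survival_from_hazard(increments, exponent=1.0):
    """Survival function at the jump times of a discrete cumulative hazard.

    increments: list of (time, jump of the baseline cumulative hazard), sorted by time.
    exponent: the multiplier exp(beta'Z) for the individual of interest.
    Returns a list of (time, S(time | Z)).
    """
    surv, s0 = [], 1.0
    for t, d_lambda in increments:
        s0 *= 1.0 - d_lambda  # one factor of the product integral
        surv.append((t, s0 ** exponent))
    return surv


# Hypothetical hazard jumps at times 1 and 2.
baseline = survival_from_hazard([(1.0, 0.5), (2.0, 0.5)])
treated = survival_from_hazard([(1.0, 0.5), (2.0, 0.5)], exponent=2.0)
print(baseline, treated)
```

With `exponent=1.0` this is the baseline survival function; other values raise it to the power required by (8).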

Suppose that the survival time for individual $i$ is censored, that is, we observe $T_i$ and $Z_i$ and we know that $X_i > T_i$. We form $\hat S(t \mid Z_i)$ by substituting $\hat\Lambda_0$ and $\hat\beta$ for $\Lambda_0$ and $\beta$ in (9), and generate $X_i$ from this distribution conditional on its being greater than $T_i$. We do this for all censored observations, and the expectations in (3) and (4) can be estimated by Monte Carlo. Standard software gives $\hat\Lambda_0$ and the corresponding $\hat S_0$, so this scheme is easy to carry out. R functions to implement this scheme and to calculate the criteria $\mathcal{RI}_1$ and $\mathcal{RI}_W$ are available from me upon request.
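The conditional draw for one censored individual can be sketched by inverse-transform sampling from the estimated step function. This is an illustration, not the R code mentioned above: `draw_death_time` is a hypothetical name, and the treatment of any probability mass lying beyond the last jump time (the draw is returned as still censored) is an assumption of this sketch.

```python
import random


def draw_death_time(surv, t_cens, rng):
    """Draw X from the step survival function surv, conditional on X > t_cens.

    surv: list of (time, S(time | Z)), sorted by time, with S nonincreasing.
    Returns (time, 1) for a drawn death time, or (time, 0) if the draw falls in
    the mass beyond the last jump (assumption: it then stays censored).
    """
    s_at_cens = 1.0
    for t, s in surv:
        if t <= t_cens:
            s_at_cens = s  # survival probability at the censoring time
    u = rng.random() * s_at_cens  # uniform on [0, S(t_cens | Z))
    for t, s in surv:
        if t > t_cens and s <= u:
            return (t, 1)  # first jump time whose survival value drops to u or below
    last = surv[-1][0] if surv else t_cens
    return (max(last, t_cens), 0)


# Hypothetical survival function with jumps at times 1, 2 and 3.
rng = random.Random(1)
step = [(1.0, 0.8), (2.0, 0.5), (3.0, 0.0)]
draws = [draw_death_time(step, 1.5, rng) for _ in range(2000)]
print(sum(t == 2.0 for t, _ in draws) / len(draws))
```

In the usage lines an individual censored at time 1.5 receives death time 2 with probability (0.8 - 0.5)/0.8 = 0.375 and death time 3 otherwise; repeating the draw for every censored observation yields one simulated complete data set.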